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ABSTRACT 

In electroneurophysiology, single-trial brain responses to a 
sensory stimulus or a motor act are commonly assumed to 
result from the linear superposition of a stereotypic event- 
related signal (e.g. the event-related potential or ERP) that 
is invariant across trials and some ongoing brain activity 
often referred to as noise. To extract the signal, one per- 
forms an ensemble average of the brain responses over many 
identical trials to attenuate the noise. To date, this simple 
signal-plus-noise (SPN) model has been the dominant ap- 
proach in cognitive neuroscience. Mounting empirical evi- 
dence has shown that the assumptions underlying this model 
may be overly simplistic. More realistic models have been 
proposed that account for the trial-to-trial variability of the 
event-related signal as well as the possibility of multiple dif- 
ferentially varying components within a given ERP wave- 
form. The variable-signal-plus-noise (VSPN) model, which 
has been demonstrated to provide the foundation for sepa- 
ration and characterization of multiple differentially varying 
components, has the potential to provide a rich source of in- 
formation for questions related to neural functions that com- 
plement the SPN model. Thus, being able to estimate the 
amplitude and latency of each ERP component on a trial-by- 
trial basis provides a critical link between the perceived ben- 
efits of the VSPN model and its many concrete applications. 
In this paper we describe a Bayesian approach to deal with 
this issue and the resulting strategy is referred to as the differ- 
entially Variable Component Analysis (dVCA). We compare 
the performance of dVCA on simulated data with Indepen- 
dent Component Analysis (ICA) and analyze neurobiologi- 
cal recordings from monkeys performing cognitive tasks. 

1. INTRODUCTION 

Relations between brain and behavior are often studied by 
recording event-related potentials (ERPs) over repeated pre- 
sentations of a sensory stimulus or task performance. Each 
task performance is referred to as a trial. On a single-trial 
basis, the traditional model holds that the data from a given 
electrode (channel) is the linear superposition of an event- 
related potential (ERP), which is considered the signal hav- 
ing a characteristic waveform whose amplitude and latency 
stay the same each time the event is repeated on multiple tri- 
als, and ongoing background activity (noise). We refer to this 
model as the signal plus noise (SPN) model and represent it 
mathematically as [1]. 

Xr(t)=s(t) + T] r (t) (1) 
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where x r (t) is the EEG recording from the r^ 1 trial, s(t) is 
the ERP, and rj r (f) is the ongoing noise process. The empir- 
ical evidence suggests that the SPN model is overly simplis- 
tic and a more realistic model should capture the trial-to-trial 
variability in amplitude and latency of the event-related sig- 
nals and the existence of multiple components with differen- 
tial variability in their single-trial amplitudes and latencies. 
A model that incorporates all these features, which we call 
the variable signal plus noise (VSPN) model, can be written 
as [2] 

N 

x r (!) = X a ^ s n 0 “ ?nr) + 7] r (t) (2) 

n= 1 


where s n (t) is the rfi 1 event-related component waveform 
with trial-to-trial variable amplitude and latency given by (V 
and T nr , respectively, and N is the total number of compo- 
nents. In many experimental paradigms, investigators use 
multiple detectors positioned at different locations to mea- 
sure the event-related activities of neural ensembles. In this 
case, there is a need to consider the specific neural ensem- 
ble whose activity is responsible for a given component that 
appears across all detectors. We call such an ensemble the 
source or the generator for the component. The degree to 
which a detector records a signal from a particular source 
depends on many factors including the relative position and 
orientation of the source relative to the detector. To describe 
this source-detector interaction, we introduce a coupling ma- 
trix C, where the matrix element Cmn describes the degree 
to which the nfi 1 detector (m = 1,2, ... ,M) detects the rfi 
source. This coupling matrix is known as the mixing matrix 
in the source separation literature and as the lead-field ma- 
trix in electroneurophysiology. For the fi 1 recorded trial we 
model the signal x{t) recorded by the rrfi 1 detector in com- 
ponent form as 


N 


x mr(t ) — ^ CmnCCm-Snit — T nr ) -f Tj mr (t) 
n=l 


(3) 


where the notation has the same interpretation as before (see 
Equation (2)). For the purpose of this work we will refer to 
both the single sensor model in (2) and multi-sensor model 
in (3) as the VSPN model. The goal of this work is to present 
a Bayesian inference framework for the estimation of all the 
parameters in the VSPN models and to then use the trial-to- 
trial variability information of the ERP components to gain 
understanding of neural functions. 




2. METHODS 

For simplicity we consider in more detail the single channel 
VSPN model in (2). Bayesian inference for the parameters 
in the VSPN model in (3) can be similarly formulated. The 
posterior probability density function (PPDF) is formulated 
as 


where D stands for data. 

Intuition about the characteristics of the MAP solution 
can be gained by examining the partial derivatives of the log- 
arithm of the PPDF with respect to each of the model param- 
eters. This leads to a practical and simple estimation algo- 
rithm. If we let 


p(s,a,z, d v jx,7) p( s , a, z, (r)|/) x 

gW s.q.T.fyCO,/) 

P(x|/) W 

where the boldface parameters denote the set of parameters 
for all the components over the whole ensemble of trials. 
The parameter 6 V (t) denotes the parameters for the ongo- 
ing process and p( s, ct, T, 6^ (t) |/) is the prior probability for 
the model parameters. For this additive model, the likeli- 
hood p(x |s, a, t, 6>rj (f), /) is given by the probability model 
of the ongoing activity, i.e., />(rj(f)|/). In the absence of 
precise knowledge about the temporal structure of the on- 
going activity, we assign rj (t) to be independent identically 
distributed with a (unknown) time-independent variance cr^ 
and zero mean. In this way, Equation (4) is rewritten as 

p(s, a, t, a n |x, /) <*= p(s, a, z, cr n |/) 

Under the constraint of a given mean and crj, and following 
the principle of maximum entropy [3], a Gaussian density 
is assigned to the likelihood function. After dropping the 
normalization term /?(x|/) _1 , the PPDF can be rewritten as 


p(s,a,T,(T r ,|x,/)«p(s,a,r,CT^|/)x ^ 2na x 

12 

*^r(0 “ ^n=l &nrSn{t ~ T nr ) 
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where R is the total number of trials, T is the total number 
of sampled data points in a given trial and other parameters 
have the same interpretation as before. For simplicity we 
take the prior distributions of a nr , r nn and 5 n (r) to be uni- 
form, with appropriate cutoffs reflecting physiologically rea- 
sonable ranges of values. Treating the variance of the ongo- 
ing activity as a nuisance parameter and assigning the Jef- 
freys prior picr^ll) — a' 1 [3], we marginalize the posterior 
over a v and obtain 



-X?=1 23L 


p(s,a,T|x,/)ocp( s ,a,Tl/)x 
if T r N 

S “ T «r) 
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A thorough evaluation of the PPDF and computation of its 
moments can be obtained via Markov Chain Monte Carlo 
(MCMC) methods [3]. Here we summarize the PPDF by 
seeking the Maximum a Posteriori (MAP) solution, i.e., a 
set of parameters that maximize the PPDF. Specifically, the 
MAP solution for the collection of model parameters M is 

M — arg max[ln(/?(Af|/)) + ln(p(D\M i I))] (7) 

M 



R T 


r=l/=l 


N 1 2 

x r { 0 — ^ &nr s n{t ~ % nr ) 
n=I 


( 8 ) 


logarithm of the posterior can be simply written as 


RT , „ 

InP = — — In Q 4- const , 


(9) 


where P stands for PPDF. Setting partial derivatives against 
the unknown single-trial parameters to zero we can solve the 
resulting equations in an algorithmic fashion [2]. 

The MAP solution above leads to a simple iterative fixed 
point algorithm. After proper initialization, at each iteration 
step the parameters for all components are updated in se- 
quence: first the latencies, then the waveforms, and finally 
the amplitudes. Specifically, let $*■(/), aj r , Tj r denote the es- 
timated values of the parameters of the f h source during the 
P h trial in the i th iteration. To avoid degeneracy in the model, 
the averages of the amplitude and latency values in each it- 
eration are constrained to (ocj r ) r = 1 and (rj r ) r = 0. In this 
way, if there is no trial-to-trial variability both in amplitude 
and latency, the superposition of the estimated component 
waveforms should equal to the simple average of single trial 
time series referred to here as AERP. For a single channel 
data set x, the algorithm consists of the following steps: 




At 7?2 — 0, the initial guess for the amplitudes and time 
offsets are set to a?} r = 1, rj r = 0. For simplicity, the de- 
cision on the number of components N is based on the in- 
spection of the AERP. Similarly, N non-overlapping seg- 
ments of the AERP are taken as the initial guesses for 
the N components’ waveforms . After this initialization, 
each iteration consists of four steps: 

- For a given trial, estimate the latency 

one component at a time, according 

to Tjy 1 = argmaxp { (r), with p(t) — 




a jrSj{t ~ ?) — 'Ln=l t n^j a nrS n (j “ T 

starting from the first and proceeding up to the N* h 
component. 

- Estimate the waveforms according to: 


*1(0 = 
1 u 


with W = Xr(t + Tj+') - 


For all the trials and components, estimate the am- 


plitudes according to: o£r = 


M _ X=1 W 


, with U — 


lf=iV 2 

x r (t) - SjLj ^jU l m s‘+\t-z l + l ) and V = ^ +1 (t - 


r* +1 

jr 
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- Increment the iteration index: i = i + 1 ; Repeat 2 
through 4 for K iterations until convergence or for 
a pre-decided number of iterations. 


No Variability 


Variability 




Figure 1 : (Top) The source waveshape results for extended 
ICA, and dVCA in the single-trial analysis cases. Only the 
dVCA analysis of the variable responses results in accurate 
identification of the underlying source waveshapes (box). 
(Bottom) The associated Amari errors for the cases above. 


3. COMPARISON TO ICA 

The dVCA algorithm shares many similarities to Infomax 
ICA [4], since in both cases, the sources are assumed to 
be linearly mixed. The greatest difference is that the sig- 
nal model used in dVCA is more specialized than the linear 
mixing model used in ICA; it allows for the sources to vary 
in specific ways from trial to trial. Second, dVCA solves for 
all the parameters, whereas ICA marginalizes over the source 
waveshapes [6, 7, 2]. Third, ICA through the use of the logis- 
tic function makes assumptions about the amplitude density 
of the source waveshapes [5, 6], whereas dVCA makes no 
such specific assumptions. 

The greatest strength of dVCA arises due to the fact that 
it explicitly models trial-to-trial variability. Here we briefly 
compare dVCA to ICA, and show that dVCA outperforms 
ICA when trial-to-trial variability is a significant factor in 
the data. We simulated electric field potentials recorded 
from a linear- array multielectrode with 15 channels spanning 
the cortical laminae in V 1 by designing three synthetic ERP 
components sampled at 2 kHz to approximate the neural en- 
semble response to diffuse red-light stimulation in macaque 
V 1 (see next section). Two synthetic data sets were used to 
perform the comparison; both of which were contaminated 
by low amplitude additive Gaussian noise (component SNRs 
of 1.2dB, ~6.1dB, and 4 .5dB). The first data set exhibited 


neither amplitude nor latency variability. The second data 
set exhibited log-normally distributed amplitude variability 
with sample mean jU = 1.0 and sample standard deviation 
cr = 1.0, normally-distributed latency variability with sample 
mean ^ = 0.0 ms and sample standard deviation a — 10.0ms. 

ICA was performed by applying the Infomax ICA algo- 
rithm in the EEGLAB toolbox [8]. The runica.m algorithm 
was used with the ‘extend’ option (extended ICA), which al- 
lows ICA to model sub-Gaussian as well as super-Gaussian 
sources. Since each data set has 15 channels, ICA esti- 
mates 15 source waveshapes. The three estimated source 
waveshapes with highest correlation to the three original 
sources were chosen for analysis while the remaining esti- 
mated sources were ignored. 

The data sets were analyzed by ICA in two different 
ways: first by averaging the epochs and analyzing the av- 
erage response (average analysis), and second by treating the 
data as a long string of concatenated single-trial responses 
(single-trial analysis). Figure 1 (top) shows the source wave- 
shape results for extended ICA, and dVCA in the single-trial 
analysis cases. Only the dVCA analysis of the variable re- 
sponses results in accurate identification of the underlying 
source waveshapes. The Amari errors [91 in Figure 1 (bot- 
tom) quantify the degree to which the components were cor- 
rectly identified in the four cases. Since dVCA can only 
analyze single-trial responses, the dVCA analysis consisted 
of analyzing the single-trial cases only resulting in only two 
bars in the bar graph. Note that dVCA performs poorly when 
there is no trial-to-trial variability. However, the presence 
of trial-to-trial variability dramatically improves the perfor- 
mance of dVCA enabling it to surpass extended ICA in sep- 
aration quality. 

4. APPLICATION TO EXPERIMENTAL DATA 

Two male macaque monkeys were trained to perform an 
intermodal selective attention task [10]. Streams of inter- 
digitated auditory and visual stimuli were presented with 
an irregular interstimulus interval (ISI) (minimum of 350 
ms). The attended modality was alternated across trial blocks 
while the physical stimuli remained the same. The eye posi- 
tions were carefully monitored and controlled within a prede- 
fined window for both attending conditions. In both auditory 
and visual modalities, two types of stimuli were presented: 
a ‘standard’ which occurred 86% of the time and a ‘deviant’ 
(created by changing the intensity of the standard stimuli), 
which was presented on 14% of the trials. The visual stimuli 
were diffusive light flashes subtending a 12-degree visual an- 
gle centered at the point of gaze. Attention to a given sensory 
modality was maintained by requiring the subject to make a 
lever-release response to the deviant stimuli while ignoring 
all stimuli in the other modality. The level of performance 
accuracy was around 92% and difficulty of discrimination 
was equated across the two modalities. A linear array elec- 
trode with 14 contacts (150 micron spacing) spanning all six 
cortical laminae was inserted in different parts of the visual 
systems. Laminar LFPs were sampled at 2 kHz. Current 
source density profiles were computed by taking the second 
spatial derivative of the laminar LFPs. Here only neural ac- 
tivity from VI in response to the standard visual stimulus 
under two different attending conditions will be considered 
to examine the effect of attention. 

A key question in the research on the neural correlates of 
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Figure 2: (Top) 14 averaged ERPs along the depth elec- 
trode for the attending- visual (att-V) condition (red) and the 
attending-auditory (att-A) condition (blue). (Bottom) The la- 
tency distributions of the responses in the two conditions. 


attention is whether early neural responses in primary sen- 
sory areas are modulated by attention. Reports on this is- 
sue in the primate visual attention literature are conflicted 
[11, 12], with some reporting attention modulation starting 
from the response onset while others contending that atten- 
tional effect lagging response onset by a substantial time dif- 
ference [10]. While the main technique previously used is the 
grand averaging method, we believe that single-trial parame- 
ters may provide a greater sensitivity to attentional effects in 
different experimental paradigms. 

In Figure 2 (top) we display the superposition of all 14 
averaged ERPs along the depth electrode for the attending- 
visual condition (red) and the attending-auditory (ignoring- 
visual) condition (blue). No statistically significant differ- 
ences are observed for the first component from 50 ms to 100 
ms (i.e. early visual processing). To examine whether the 
single trial parameters are modulated by attention we mea- 
sured single trial latencies by the dVCA method for the first 
evoked component. The distributions of the trial-by-trial la- 
tencies are shown in Figure 2 (bottom). The distributions for 
the two conditions are significantly different (p < 0.01). In 
particular, when the monkey attended the visual modality, a 
narrower and more focused latency distribution is seen (Fig- 
ure 2 bottom right) when compared to the distribution from 
attending-auditory trials, suggesting that on a trial-by-trial 
basis, the timing of the evoked component becomes more 
reliable. The narrower latency distribution of the first com- 
ponent is often accompanied by smaller trial-by-trial ampli- 
tudes (results not shown). On simple averaging this narrow 
latency distribution is offset by the smaller amplitude distri- 
bution and the result is an AERP that is not different from the 
ignoring- visual condition. The reason why the evoked com- 
ponent has a smaller amplitude for attending-visual trials is 
not known. 
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